Pitcher Precision

Sasank Vishnubhatla

4/17/2019

Last Update: 2019-05-03 10:16:34

Libraries

Let’s load some libraries first.

library(baseballr)
library(pitchRx)
library(tidyverse)

Let’s also clear out the environment.

rm(list = ls())

With these libraries, we can get our data as well as visualize it. Let’s pick some players and see what we can learn about them.

Data Loading

Here is the list of players I will be looking at.

  • Noah Syndergaard - Player ID: 592789
  • Patrick Corbin - Player ID: 571578
  • Felipe Vazquez - Player ID: 553878
  • Marcus Stroman - Player ID: 573186
  • Justin Verlander - Player ID: 434378
  • Blake Treinen - Player ID: 595014

Let’s now scrape the data for each player.

scrape.data = function(start, id) {
    data = scrape_statcast_savant(start_date = start,
                                  end_date = format(Sys.time(), "%Y-%m-%d"),
                                  playerid = id,
                                  player_type = 'pitcher')
    data
}

start = "2019-01-01"

syndergaard.data = scrape.data(start, 592789)
corbin.data = scrape.data(start, 571578)
vazquez.data = scrape.data(start, 553878)
stroman.data = scrape.data(start, 573186)
verlander.data = scrape.data(start, 434378)
treinen.data = scrape.data(start, 595014)
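
The six calls above differ only in the player ID, so they could also be driven from a named vector. Here is a minimal sketch of that idea. The names scrape.all and fake.scraper are hypothetical; fake.scraper is only an offline stand-in so the example runs without a network call, and in the real workflow scrape.data would be passed in its place.

```r
# Named vector of the Player IDs listed above.
pitcher.ids = c(syndergaard = 592789, corbin = 571578, vazquez = 553878,
                stroman = 573186, verlander = 434378, treinen = 595014)

# scrape.all (hypothetical helper) applies a scraper function to every ID
# and returns a list named after the pitchers.
scrape.all = function(ids, start, scraper) {
    lapply(ids, function(id) scraper(start, id))
}

# Offline stand-in for scrape.data: returns a one-row data frame per pitcher.
fake.scraper = function(start, id) {
    data.frame(pitcher = id, game_date = start, stringsAsFactors = FALSE)
}

raw = scrape.all(pitcher.ids, "2019-01-01", fake.scraper)
names(raw)           # one list element per pitcher
raw$corbin$pitcher   # the ID that was passed for Corbin
```

Keeping the results in one named list also makes it easy to apply a later step to every pitcher with a single lapply call.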

Now with our data, let’s get the information we want out of it.

filter.data = function(data) {
    filtered = data.frame(name = data %>% pull(player_name),
                          pitch = data %>% pull(pitch_type),
                          outcome = data %>% pull(type),
                          date = data %>% pull(game_date),
                          event = data %>% pull(events),
                          descrip = data %>% pull(description),
                          xcoord = data %>% pull(plate_x),
                          ycoord = data %>% pull(plate_z),
                          xmove = data %>% pull(pfx_x),
                          ymove = data %>% pull(pfx_z),
                          velo = data %>% pull(effective_speed),
                          spin = data %>% pull(release_spin_rate),
                          exvelo = data %>% pull(launch_speed),
                          exang = data %>% pull(launch_angle),
                          year = substring(data %>% pull(game_date), 1, 4))
    filtered
}

syndergaard = filter.data(syndergaard.data)
corbin = filter.data(corbin.data)
stroman = filter.data(stroman.data)
treinen = filter.data(treinen.data)
vazquez = filter.data(vazquez.data)
verlander = filter.data(verlander.data)
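
The same select-and-rename step can be written more compactly with dplyr’s transmute, which keeps only the columns it is given. This is a sketch on a toy data frame, not a replacement for filter.data: the function name filter.data.sketch is hypothetical, and only three of the Statcast columns are included to keep it short.

```r
library(dplyr)

# Toy stand-in for a Statcast result with three of the raw columns.
raw = data.frame(player_name = c("A", "A"),
                 pitch_type = c("FF", "SL"),
                 game_date = c("2019-04-01", "2019-04-06"),
                 stringsAsFactors = FALSE)

# filter.data.sketch (hypothetical): rename the columns we want and
# derive the year from the first four characters of the game date.
filter.data.sketch = function(data) {
    data %>%
        transmute(name = player_name,
                  pitch = pitch_type,
                  date = game_date,
                  year = substring(game_date, 1, 4))
}

filtered = filter.data.sketch(raw)
filtered
```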

With this filtered data, we have selected the following columns:

  • Name
  • Pitch type
  • Pitch outcome (Strike, Ball, In play)
  • Game Date
  • Event
  • Description of event
  • X coordinate (horizontal location) of pitch
  • Y coordinate (vertical location) of pitch
  • Horizontal movement (X coordinate movement) of pitch
  • Vertical movement (Y coordinate movement) of pitch
  • Velocity of pitch (Statcast’s effective speed)
  • Spin rate of pitch
  • Exit velocity of the batted ball
  • Launch angle of the batted ball
  • Year (taken from the game date)

Visualization

Let’s start visualizing some of this data. Before that, let me define a strike zone. This strike zone was taken from the website Baseball with R.

topKzone = 3.5
botKzone = 1.6
inKzone = -.95
outKzone = 0.95
kZone = data.frame(x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
                   y = c(botKzone, topKzone, topKzone, botKzone, botKzone))
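
Besides drawing the zone, the same four bounds can be used to flag whether a pitch landed inside it. The sketch below assumes the bounds defined above; in.zone is a hypothetical helper and the coordinates are made-up toy values.

```r
# Strike zone bounds as defined above.
topKzone = 3.5
botKzone = 1.6
inKzone = -0.95
outKzone = 0.95

# in.zone (hypothetical helper): TRUE when a location falls inside the
# rectangle. A missing coordinate can produce NA instead of TRUE/FALSE.
in.zone = function(x, y) {
    x >= inKzone & x <= outKzone & y >= botKzone & y <= topKzone
}

# Toy locations: in the zone, too high, too far outside, and one missing.
xcoord = c(0.0, 0.2, 1.4, NA)
ycoord = c(2.5, 4.1, 2.0, 2.2)

flags = in.zone(xcoord, ycoord)

# Share of located pitches inside the zone; NAs are dropped first.
zone.rate = mean(flags, na.rm = TRUE)
```

With a filtered pitcher data frame, the call would be in.zone(player$xcoord, player$ycoord).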

Pitch Location via Outcome

Let’s look at pitch location by outcome, that is, whether the pitch was a ball, a strike, or put into play. In the outcome column, X is a ball hit into play, B is a ball, and S is any type of strike.

graph.pitch.heatmap.out = function(player) {
    graph = ggplot(player) +
        geom_jitter(aes(x = xcoord,
                        y = ycoord,
                        color = outcome)) +
        xlab("Horizontal Position") +
        ylab("Vertical Position") +
        ggtitle(paste(player$name[1], "Heatmap", sep = " ")) +
        labs(color = "Pitch Outcome") +
        theme_minimal() + geom_path(aes(x, y), data = kZone)
    graph
}

corbin.heatmap.out = graph.pitch.heatmap.out(corbin)
corbin.heatmap.out

stroman.heatmap.out = graph.pitch.heatmap.out(stroman)
stroman.heatmap.out

syndergaard.heatmap.out = graph.pitch.heatmap.out(syndergaard)
syndergaard.heatmap.out
## Warning: Removed 1 rows containing missing values (geom_point).

treinen.heatmap.out = graph.pitch.heatmap.out(treinen)
treinen.heatmap.out
## Warning: Removed 23 rows containing missing values (geom_point).

vazquez.heatmap.out = graph.pitch.heatmap.out(vazquez)
vazquez.heatmap.out

verlander.heatmap.out = graph.pitch.heatmap.out(verlander)
verlander.heatmap.out
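
The warnings above come from pitches with no recorded location. A quick way to see how many rows are affected before plotting is to count the missing coordinates directly. This is a small sketch on a toy data frame with the same column names as the filtered data.

```r
# Toy stand-in for a filtered pitcher data frame.
player = data.frame(xcoord = c(-0.3, NA, 0.8, 1.1),
                    ycoord = c(2.4, NA, 3.0, NA),
                    outcome = c("S", "B", "X", "B"),
                    stringsAsFactors = FALSE)

# A row cannot be drawn if either plotted coordinate is missing.
missing.loc = is.na(player$xcoord) | is.na(player$ycoord)
n.missing = sum(missing.loc)

# Plot-ready subset that keeps only pitches with a full location.
located = player[!missing.loc, ]
```

Passing the located subset to the plotting function should avoid the warning, at the cost of silently leaving those pitches out of the chart.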

Pitch Location via Pitch Type

Let’s look at pitch location via pitch type.

graph.pitch.heatmap.type = function(player) {
    graph = ggplot(player) +
        geom_jitter(aes(x = xcoord,
                        y = ycoord,
                        color = pitch)) +
        xlab("Horizontal Position") +
        ylab("Vertical Position") +
        ggtitle(paste(player$name[1], "Heatmap", sep = " ")) +
        labs(color = "Pitch Type") +
        theme_minimal() + geom_path(aes(x, y), data = kZone)
    graph
}

corbin.heatmap.type = graph.pitch.heatmap.type(corbin)
corbin.heatmap.type

stroman.heatmap.type = graph.pitch.heatmap.type(stroman)
stroman.heatmap.type

syndergaard.heatmap.type = graph.pitch.heatmap.type(syndergaard)
syndergaard.heatmap.type
## Warning: Removed 1 rows containing missing values (geom_point).

treinen.heatmap.type = graph.pitch.heatmap.type(treinen)
treinen.heatmap.type
## Warning: Removed 23 rows containing missing values (geom_point).

vazquez.heatmap.type = graph.pitch.heatmap.type(vazquez)
vazquez.heatmap.type

verlander.heatmap.type = graph.pitch.heatmap.type(verlander)
verlander.heatmap.type
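
The charts above are jittered scatter plots, so dense areas can overplot. One possible alternative, sketched here on simulated locations, is to bin the pitches with ggplot2’s geom_bin2d and give each pitch type its own panel with facet_wrap. The data, the bin count, and the object name binned are all assumptions for illustration.

```r
library(ggplot2)

# Simulated pitch locations for two pitch types (not real Statcast data).
set.seed(1)
player = data.frame(xcoord = rnorm(200, mean = 0, sd = 0.8),
                    ycoord = rnorm(200, mean = 2.5, sd = 0.8),
                    pitch = sample(c("FF", "SL"), 200, replace = TRUE))

# Same strike zone outline as defined earlier.
kZone = data.frame(x = c(-0.95, -0.95, 0.95, 0.95, -0.95),
                   y = c(1.6, 3.5, 3.5, 1.6, 1.6))

# Count pitches per rectangular bin, one panel per pitch type.
# kZone has no pitch column, so the outline is repeated in every panel.
binned = ggplot(player, aes(x = xcoord, y = ycoord)) +
    geom_bin2d(bins = 10) +
    geom_path(aes(x, y), data = kZone, inherit.aes = FALSE) +
    facet_wrap(~ pitch) +
    xlab("Horizontal Position") +
    ylab("Vertical Position") +
    labs(fill = "Pitches") +
    theme_minimal()
binned
```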

Pitch Location via Velocity

Let’s look at pitch location via velocity.

graph.pitch.heatmap.velo = function(player) {
    graph = ggplot(player) +
        geom_jitter(aes(x = player$xcoord,
                        y = player$ycoord,
                        color = player$velo)) +
        xlab("Horizontal Position") +
        ylab("Vertical Position") +
        ggtitle(paste(player$name[1], "Heatmap", sep = " ")) +
        labs(color = "Velocity") +
        scale_color_gradient(low = "blue", high = "red") +
        theme_minimal() + geom_path(aes(x, y), data = kZone)
    graph
}

corbin.heatmap.velo = graph.pitch.heatmap.velo(corbin)
corbin.heatmap.velo

stroman.heatmap.velo = graph.pitch.heatmap.velo(stroman)
stroman.heatmap.velo

syndergaard.heatmap.velo = graph.pitch.heatmap.velo(syndergaard)
syndergaard.heatmap.velo
## Warning: Removed 1 rows containing missing values (geom_point).

treinen.heatmap.velo = graph.pitch.heatmap.velo(treinen)
treinen.heatmap.velo
## Warning: Removed 23 rows containing missing values (geom_point).

vazquez.heatmap.velo = graph.pitch.heatmap.velo(vazquez)
vazquez.heatmap.velo

verlander.heatmap.velo = graph.pitch.heatmap.velo(verlander)
verlander.heatmap.velo

Pitch Movement

To view the movement, let’s look at the distribution of movement for each pitch type that each player throws. First, let’s define a couple of helper functions.

graph.pitch.xmovement = function(player) {
    graph = ggplot(player) +
        geom_boxplot(aes(x = pitch,
                         y = xmove,
                         color = pitch)) +
        coord_flip() +
        labs(color = "Pitch Type") +
        xlab("Pitch Type") + ylab("Horizontal Movement") +
        ggtitle(paste(player$name[1], "Horizontal Movement", sep = " ")) +
        theme_minimal()
    graph
}

graph.pitch.ymovement = function(player) {
    graph = ggplot(player) +
        geom_boxplot(aes(x = pitch,
                         y = ymove,
                         color = pitch)) +
        labs(color = "Pitch Type") +
        xlab("Pitch Type") + ylab("Vertical Movement") +
        ggtitle(paste(player$name[1], "Vertical Movement", sep = " ")) +
        theme_minimal()
    graph
}
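
The boxplots show the spread of movement; if a single average per pitch type is wanted as well, base R’s aggregate can compute it. This is a sketch on toy values using the same column names as the filtered data; avg.move is a name chosen here for illustration.

```r
# Toy movement values for two pitch types.
player = data.frame(pitch = c("FF", "FF", "SL", "SL", "SL"),
                    xmove = c(-0.5, -0.7, 0.4, 0.6, NA),
                    ymove = c(1.4, 1.6, 0.1, 0.3, 0.2),
                    stringsAsFactors = FALSE)

# Mean horizontal and vertical movement per pitch type. With the formula
# interface, a row with a missing value in either column is dropped by
# default, so the last SL row does not contribute to either mean.
avg.move = aggregate(cbind(xmove, ymove) ~ pitch, data = player, FUN = mean)
avg.move
```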

Patrick Corbin

corbin.xmove = graph.pitch.xmovement(corbin)
corbin.ymove = graph.pitch.ymovement(corbin)
corbin.xmove

corbin.ymove

Marcus Stroman

stroman.xmove = graph.pitch.xmovement(stroman)
stroman.ymove = graph.pitch.ymovement(stroman)
stroman.xmove

stroman.ymove

Noah Syndergaard

syndergaard.xmove = graph.pitch.xmovement(syndergaard)
syndergaard.ymove = graph.pitch.ymovement(syndergaard)
syndergaard.xmove
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

syndergaard.ymove
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

Blake Treinen

treinen.xmove = graph.pitch.xmovement(treinen)
treinen.ymove = graph.pitch.ymovement(treinen)
treinen.xmove
## Warning: Removed 23 rows containing non-finite values (stat_boxplot).

treinen.ymove
## Warning: Removed 23 rows containing non-finite values (stat_boxplot).

Felipe Vazquez

vazquez.xmove = graph.pitch.xmovement(vazquez)
vazquez.ymove = graph.pitch.ymovement(vazquez)
vazquez.xmove

vazquez.ymove

Justin Verlander

verlander.xmove = graph.pitch.xmovement(verlander)
verlander.ymove = graph.pitch.ymovement(verlander)
verlander.xmove

verlander.ymove

Pitch Velocity

We color each pitch by its type, then plot velocity against the order of the pitches in the data, so we can see how each pitch type’s velocity changed over time.

graph.pitch.velo = function(player) {
    graph = ggplot(player) +
        geom_line(aes(x = seq_along(velo),
                      y = velo,
                      color = pitch)) +
        xlab("Pitches Thrown") + ylab("Velocity") + labs(color = "Pitch Type") +
        ggtitle(paste(player$name[1], "Pitch Velocity Chart", sep = " ")) +
        theme_minimal()
    graph
}

corbin.velo = graph.pitch.velo(corbin)
corbin.velo

stroman.velo = graph.pitch.velo(stroman)
stroman.velo

syndergaard.velo = graph.pitch.velo(syndergaard)
syndergaard.velo
## Warning: Removed 1 rows containing missing values (geom_path).

treinen.velo = graph.pitch.velo(treinen)
treinen.velo
## Warning: Removed 23 rows containing missing values (geom_path).

vazquez.velo = graph.pitch.velo(vazquez)
vazquez.velo

verlander.velo = graph.pitch.velo(verlander)
verlander.velo
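
The charts above put every pitch on one shared x axis. If each pitch type should instead get its own running count (first fastball, second fastball, and so on), base R’s ave can number the pitches within each type. This is a sketch on toy data; the column name nth is an assumption.

```r
# Toy pitch sequence in the order the rows appear in the data.
player = data.frame(pitch = c("FF", "SL", "FF", "FF", "SL"),
                    velo = c(95.1, 86.0, 94.8, 95.5, 85.2),
                    stringsAsFactors = FALSE)

# Running count within each pitch type, restarting at 1 for every type.
player$nth = ave(seq_along(player$pitch), player$pitch, FUN = seq_along)
player
```

Using nth on the x axis would line up each pitch type from its own first pitch.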

Pitch Spin Rate

graph.pitch.spin = function(player) {
    graph = ggplot(player) +
        geom_step(aes(x = seq_along(spin),
                      y = spin,
                      color = pitch),
                  direction = "vh") +
        xlab("Pitches Thrown") + ylab("Spin Rate") + labs(color = "Pitch Type") +
        ggtitle(paste(player$name[1], "Pitch Spin Rate Chart", sep = " ")) +
        theme_minimal()
    graph
}

corbin.spin = graph.pitch.spin(corbin)
corbin.spin

stroman.spin = graph.pitch.spin(stroman)
stroman.spin

syndergaard.spin = graph.pitch.spin(syndergaard)
syndergaard.spin
## Warning: Removed 1 rows containing missing values (geom_path).

treinen.spin = graph.pitch.spin(treinen)
treinen.spin
## Warning: Removed 23 rows containing missing values (geom_path).

vazquez.spin = graph.pitch.spin(vazquez)
vazquez.spin

verlander.spin = graph.pitch.spin(verlander)
verlander.spin

Analysis

I’ll be looking at a few specific Pittsburgh Pirates pitchers and comparing each of them from year to year.

Jameson Taillon

Let’s first read in our data for Taillon.

taillon.data = scrape.data("2018-01-01", 592791)
## 2018-01-01 is not a date. Attempting to coerce...
## https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC&hfSea=2018%7C&hfSit=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&player_type=pitcher&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&pitchers_lookup%5B%5D=592791&game_date_gt=2018-01-01&game_date_lt=2019-05-03&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details
## These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved.
## Grabbing data, this may take a minute...
## URL read and payload acquired successfully.
taillon = filter.data(taillon.data)

Now, let’s just get some averages of Taillon’s pitches.

taillon.ff = taillon[taillon$pitch == "FF",]
taillon.ft = taillon[taillon$pitch == "FT",]
taillon.sl = taillon[taillon$pitch == "SL",]
taillon.cu = taillon[taillon$pitch == "CU",]
taillon.ch = taillon[taillon$pitch == "CH",]

taillon.ff = taillon.ff[complete.cases(taillon.ff),]
taillon.ft = taillon.ft[complete.cases(taillon.ft),]
taillon.sl = taillon.sl[complete.cases(taillon.sl),]
taillon.cu = taillon.cu[complete.cases(taillon.cu),]
taillon.ch = taillon.ch[complete.cases(taillon.ch),]
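Pitch             Average Velocity (mph)   Standard Deviation of Velocity (mph)   Average Spin Rate (rpm)
4-Seam Fastball   95.4522176               1.0237036                              2355.1823529
2-Seam Fastball   NA                       NA                                     NA
Slider            NA                       NA                                     NA
Curveball         NA                       NA                                     NA
Changeup          95.62225                 0.936624                               2356.7857143

Only the 4-Seam Fastball row can be taken at face value. The Changeup row nearly duplicates it and the other rows are NA, which suggests those four rows were not computed from their own pitch subsets.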
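
The code that produced the table is not shown, so here is one way such a summary could be computed. It is a sketch on toy data and summarise.pitch is a hypothetical helper. It drops missing values per column with na.rm instead of filtering whole rows with complete.cases, because columns such as the event, exit velocity and launch angle are typically missing for any pitch that was not put in play, and a whole-row filter would throw those pitches away.

```r
# Toy pitches: three fastballs and two changeups, one with no spin reading.
pitches = data.frame(pitch = c("FF", "FF", "FF", "CH", "CH"),
                     velo = c(95.0, 96.0, 94.0, 88.0, 87.0),
                     spin = c(2300, 2400, 2350, 1700, NA),
                     stringsAsFactors = FALSE)

# summarise.pitch (hypothetical): one summary row for one pitch type.
summarise.pitch = function(d) {
    data.frame(pitch = d$pitch[1],
               avg.velo = mean(d$velo, na.rm = TRUE),
               sd.velo = sd(d$velo, na.rm = TRUE),
               avg.spin = mean(d$spin, na.rm = TRUE),
               stringsAsFactors = FALSE)
}

# Split by pitch type, summarise each piece, and stack the rows.
pitch.summary = do.call(rbind,
                        lapply(split(pitches, pitches$pitch), summarise.pitch))
pitch.summary
```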

Now let’s make some graphs.

taillon.heatmap.out = graph.pitch.heatmap.out(taillon)
taillon.heatmap.out

taillon.heatmap.type = graph.pitch.heatmap.type(taillon)
taillon.heatmap.type

taillon.heatmap.velo = graph.pitch.heatmap.velo(taillon)
taillon.heatmap.velo

taillon.spin = graph.pitch.spin(taillon)
taillon.spin

taillon.velo = graph.pitch.velo(taillon)
taillon.velo

taillon.xmove = graph.pitch.xmovement(taillon)
taillon.xmove

taillon.ymove = graph.pitch.ymovement(taillon)
taillon.ymove

Richard Rodriguez

rodriguez.data = scrape.data("2018-01-01", 593144)
## 2018-01-01 is not a date. Attempting to coerce...
## https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC&hfSea=2018%7C&hfSit=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&player_type=pitcher&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&pitchers_lookup%5B%5D=593144&game_date_gt=2018-01-01&game_date_lt=2019-05-03&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details
## These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved.
## Grabbing data, this may take a minute...
## URL read and payload acquired successfully.
rodriguez = filter.data(rodriguez.data)

Joe Musgrove

musgrove.data = scrape.data("2018-01-01", 605397)
## 2018-01-01 is not a date. Attempting to coerce...
## https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC&hfSea=2018%7C&hfSit=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&player_type=pitcher&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&pitchers_lookup%5B%5D=605397&game_date_gt=2018-01-01&game_date_lt=2019-05-03&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details
## These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved.
## Grabbing data, this may take a minute...
## URL read and payload acquired successfully.
musgrove = filter.data(musgrove.data)

Jordan Lyles

lyles.data = scrape.data("2018-01-01", 543475)
## 2018-01-01 is not a date. Attempting to coerce...
## https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC&hfSea=2018%7C&hfSit=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&player_type=pitcher&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&pitchers_lookup%5B%5D=543475&game_date_gt=2018-01-01&game_date_lt=2019-05-03&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details
## These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved.
## Grabbing data, this may take a minute...
## URL read and payload acquired successfully.
lyles = filter.data(lyles.data)

Kyle Crick

crick.data = scrape.data("2018-01-01", 605195)
## 2018-01-01 is not a date. Attempting to coerce...
## https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC&hfSea=2018%7C&hfSit=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&player_type=pitcher&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&pitchers_lookup%5B%5D=605195&game_date_gt=2018-01-01&game_date_lt=2019-05-03&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details
## These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved.
## Grabbing data, this may take a minute...
## URL read and payload acquired successfully.
crick = filter.data(crick.data)
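
Since each of these data sets starts in 2018, the year column added by filter.data can be used for the year-to-year comparison. Below is a minimal sketch of one way to do that, on toy values rather than the scraped data: average velocity per pitch type and season, then the change between seasons. The names by.year, wide and change are assumptions.

```r
# Toy pitches across two seasons (not real Statcast values).
pitches = data.frame(pitch = c("FF", "FF", "FF", "FF", "SL", "SL"),
                     year = c("2018", "2018", "2019", "2019", "2018", "2019"),
                     velo = c(95.0, 96.0, 94.0, 95.0, 88.0, 89.0),
                     stringsAsFactors = FALSE)

# Mean velocity for every pitch type and season.
by.year = aggregate(velo ~ pitch + year, data = pitches, FUN = mean)

# One row per pitch type with one velocity column per season.
wide = reshape(by.year, idvar = "pitch", timevar = "year", direction = "wide")

# Positive values mean the pitch got faster from 2018 to 2019.
wide$change = wide$velo.2019 - wide$velo.2018
wide
```

The same pattern would work for spin or movement by swapping the column on the left of the formula.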